
Don't re-use logits processors in SequenceGeneratorAdapter, copy them #1160

Merged (1 commit) Sep 23, 2024

Conversation

lapp0 (Collaborator) commented Sep 17, 2024

Fixes #1109

Problem

Logits processors can't be reused across multiple generation runs and must be copied, but SequenceGeneratorAdapter wasn't respecting this requirement. As a result, a generator created with generator = generate.choice(...) could only be used once.

During inference, logits processors are called with a sequence of input_ids + output_ids. Since structured generation only applies to output_ids, we track the length of input_ids on the first call and treat subsequent tokens as output_ids. However, this approach fails when the input_ids sequence changes.
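The failure mode described above can be sketched with a minimal, hypothetical processor (this is illustrative, not the actual Outlines implementation): the prompt length recorded on the first call is stale for every subsequent run.

```python
# Hypothetical minimal logits processor (not the actual Outlines code),
# illustrating why per-call state makes reuse across prompts unsafe.
class StatefulLogitsProcessor:
    def __init__(self):
        self._prompt_len = None  # length of input_ids, recorded on first call

    def __call__(self, input_ids, logits):
        if self._prompt_len is None:
            # First call of a run: every token seen so far is the prompt.
            self._prompt_len = len(input_ids)
        # Structured generation should only constrain the generated suffix.
        output_ids = input_ids[self._prompt_len:]
        # (masking of `logits` based on `output_ids` omitted)
        return output_ids  # returned here only to make the bug visible

proc = StatefulLogitsProcessor()
proc([1, 2, 3], logits=None)            # run 1: records prompt length 3
run1 = proc([1, 2, 3, 7], logits=None)  # run 1: suffix is [7] -- correct
# Reusing the same instance on a new, shorter prompt misattributes tokens:
run2 = proc([9, 9, 5], logits=None)     # run2 == [] -- token 5 was
                                        # wrongly treated as prompt
```

A fresh instance per run would correctly report [5] as the generated suffix for the second prompt, which is exactly what copying the processor restores.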

Solution

Copy the logits processors on each SequenceGeneratorAdapter.__call__(...), ensuring they can correctly determine the start of output_ids.
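The fix can be sketched as follows; the class and method names mirror the PR's description, but the body is a simplified assumption, not the exact Outlines internals.

```python
import copy

# Hedged sketch of the fix: the adapter deep-copies its logits processor
# on every call, so each generation run starts with fresh state (e.g. the
# recorded prompt length). `model.generate` is a stand-in interface.
class SequenceGeneratorAdapter:
    def __init__(self, model, logits_processor):
        self.model = model
        self.logits_processor = logits_processor

    def __call__(self, prompt, **kwargs):
        # Never hand the stored processor to the model directly: its
        # mutable state would leak into the next run.
        processor = copy.deepcopy(self.logits_processor)
        return self.model.generate(prompt, logits_processor=processor, **kwargs)
```

With this change, calling the same generator repeatedly works because the original processor held by the adapter is never mutated.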

Further Work

Logits processors require more documentation, especially since vLLM contributors have discussed replacing their logits processor implementation with import outlines.processors.

@lapp0 lapp0 marked this pull request as ready for review September 17, 2024 22:08
@rlouf rlouf merged commit 77c6d67 into dottxt-ai:main Sep 23, 2024
7 checks passed
Development

Successfully merging this pull request may close these issues.

llama_cpp - Multiple calls to 'choice' generator do not return results.